Final Project Report
ARO Fellowship
DAAL03-86-G-0023

Professor Allan Gottlieb
New York University

The NYU Ultracomputer project has, for the past 10 years, been studying many aspects
of highly parallel processing on shared memory MIMD computers. In order to reduce serial
bottlenecks for such computers we have introduced a primitive called Fetch-and-add that
appears to be especially useful for coordinating the activities of large numbers of cooperating
processors. In our proposed architecture, the processor-to-memory interconnection network is
enhanced to combine simultaneous requests directed at the same memory location, including
the important special case of simultaneous Fetch-and-adds.
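
As an illustration of the primitive described above, the following is a minimal software sketch, not the Ultracomputer hardware or its system software. It assumes the usual semantics of Fetch-and-add: atomically add an increment to a shared location and return the old value. The names SharedCell, claim_indices, and combine are hypothetical, and a lock stands in for the atomicity that the hardware would provide. The combine function only models the net effect of combining (the replies look as if the requests had been serviced one after another in some order); it makes no claim about how the network switches are built.

```python
import threading


class SharedCell:
    """A shared memory word with an atomic fetch-and-add (software simulation)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # assumption: stands in for hardware atomicity

    def fetch_and_add(self, increment):
        """Atomically add `increment` to the cell and return the OLD value."""
        with self._lock:
            old = self._value
            self._value = old + increment
            return old


def claim_indices(num_workers, num_items):
    """Workers share loop iterations by claiming the next index via fetch-and-add."""
    next_index = SharedCell(0)
    claimed = [[] for _ in range(num_workers)]

    def worker(worker_id):
        while True:
            i = next_index.fetch_and_add(1)  # each call returns a distinct index
            if i >= num_items:
                break
            claimed[worker_id].append(i)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed


def combine(old_value, increments):
    """Model the net effect of combining simultaneous fetch-and-adds.

    One memory access adds the sum of all increments; each requester receives
    the value it would have seen had the requests been serviced in list order.
    Returns (replies, final_value).
    """
    replies = []
    running = old_value
    for inc in increments:
        replies.append(running)
        running += inc
    return replies, running
```

In this sketch no index is claimed twice and none is skipped, which is the property that makes the primitive useful for coordinating many processors without a serial critical section in the program itself.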

Our overall accomplishments include the architectural innovations just mentioned,
numerous algorithms and algorithmic analyses, the construction of fully functional
multiprocessors, two generations of highly parallel operating systems and other system
software, the production of parallel programs for several significant (mostly scientific)
applications, and the design and implementation of full-custom VLSI chips for combining
memory references.

During the three years that the fellowship was funded (September 1986 through September
1989), three graduate students, Mr. Wayne Berke, Mr. Laurence Kaplan, and Mr. Jiarui Wang,
were supported for Ultracomputer-related research. Berke worked on compiler and run-time
issues, Kaplan worked on I/O, and Wang worked on hardware engineering. We describe their
research in turn.

1986-87 (Wayne Berke)

Mr. Berke was previously supported on an earlier ARO grant, during which time he
produced pdb, a debugger for parallel programs. This debugger has proved quite useful and
was described in the final report for the previous ARO grant. During the present
contract period, Berke developed a parallel FORTRAN environment for the Ultracomputer
multiprocessor prototypes. This effort required a sophisticated preprocessor as well as
modifications to the (existing) FORTRAN compiler. Both of these projects were
significant software efforts. The first involved design and implementation of a complex
program; the second involved understanding and changing a large (and poorly
constructed) existing compiler. The resulting system has been enormously successful. It is
the main software vehicle that our users have employed to write parallel programs.
These users range from very experienced, highly trained scientists developing
state-of-the-art numerical codes to graduate students writing their first parallel program for a
course offered at NYU. After releasing the software, Berke was very responsive to
requests for enhancements. The system quickly became stable and is heavily used to this
day, with no further maintenance required.

1987-88 (Laurence Kaplan)

Mr. Kaplan's tenure at NYU was unfortunately limited to just two academic years, 1986-88,
due to circumstances unrelated to his responsibilities as an ARO fellow (which began
January 87). Since leaving NYU in 1988, Kaplan has worked for BBN on systems
software for their well-known Butterfly parallel processor. While an ARO fellow, Kaplan
was responsible for much of the I/O development in our laboratory. This included
preliminary designs for parallel I/O for future Ultracomputers as well as the pragmatic issues
concerned with I/O support for the workstations on which we performed our software
development. He was comfortable with Unix device drivers, which are traditionally a
rather murky area shunned by beginning students.

1988-1989 (Jiarui Wang)

Mr. Wang continues as a graduate student in our laboratory; subsequent to the ARO
support he has received an NYU fellowship. Throughout his ongoing tenure at NYU,
Wang has been heavily involved in the hardware design of Ultracomputer prototypes. He
has been a primary contributor to two circuit boards: our new memory module and our
first AMD 29000 processor module. In addition, he has designed and installed significant
modifications to an existing arbiter module that enabled our old bus-based system to make
use of our new network and memory modules. The new memory modules each contain 4
megabytes implemented using 1 megabit DRAM chips. These modules are dual ported
and contain on-board refresh circuitry. In addition to supporting normal loads and
(partial-word) stores, the modules implement the NYU fetch-and-phi and reflection
operations. These modules are in production use and perform well. The AMD 29000
processor module supports fetch-and-phi and interfaces to NYU switches. The purpose of this
module was to familiarize ourselves with the 29000, since it is to be the processor for our
next generation of hardware prototypes. The module works flawlessly but, by design,
contains only essential functionality. A significantly enhanced version of this module
(including caches and support for multiple outstanding memory references) is currently under
development by Wang and others.